Prediction of Protein Secondary Structures with a Novel Kernel Density Estimator
Abstract
* To whom correspondence should be addressed. E-mail: [email protected], Tel: +886-2-33664888 ext. 431, Fax: +886-2-23688675.
Although prediction of protein secondary structures has been an active research topic in bioinformatics for many years and many approaches have been proposed, a new challenge emerges as contemporary protein structure databases such as the Protein Data Bank (PDB) continue to grow exponentially. The challenge is how to exploit the huge amount of structural information deposited in these databases effectively, so that prediction accuracy keeps improving as the databases grow. This article addresses the challenge with an approach based on kernel density estimation. The proposed kernel density estimator is distinctive in that the pointwise mean square error (MSE) of its basic form converges at O(n^(-2/3)) regardless of the dimension of the vector space, where n is the number of instances in the training dataset. In addition, like many conventional kernel density estimators, it has average O(n log n) time complexity for generating the approximation function. The experimental results show that, with the novel kernel density estimator, the proposed predictor outperforms the state-of-the-art predictors currently available. The experimental results further suggest that the prediction accuracy delivered by the proposed predictor will continue to increase as the protein structure database keeps growing.
Similar resources
Physicochemical Position-Dependent Properties in the Protein Secondary Structures
Background: Establishing theories for designing arbitrary protein structures is complicated and depends on understanding the principles for protein folding, which is affected by applied features. Computer algorithms can reach high precision and stability in computationally designing enzymes and binders by applying informative features obtained from natural structures. Methods: In this study, a ...
Comparison of the Gamma kernel and the orthogonal series methods of density estimation
The standard kernel density estimator suffers from boundary bias when estimating probability density functions of distributions on the positive real line. The Gamma kernel estimators and orthogonal series estimators are two alternatives which are free of boundary bias. In this paper, a simulation study is conducted to compare small-sample performance of the Gamma kernel estimators and the orthog...
Asymptotic Behaviors of Nearest Neighbor Kernel Density Estimator in Left-truncated Data
Kernel density estimators are the basic tools for density estimation in non-parametric statistics. The k-nearest neighbor kernel estimators represent a special form of kernel density estimators, in which the bandwidth is varied depending on the location of the sample points. In this paper, we initially introduce the k-nearest neighbor kernel density estimator in the random left-truncatio...
A Berry-Esseen Type Bound for the Kernel Density Estimator of Length-Biased Data
Length-biased data are widely seen in applications. They arise mostly in epidemiological studies and in survival analysis in medical research. Here we aim to propose a Berry-Esseen type bound for the kernel density estimator of this kind of data. The rate of normal convergence in the proposed Berry-Esseen type theorem is shown to be O(n^(-1/6)) modulo a logarithmic term as n tends to infin...
The Relative Improvement of Bias Reduction in Density Estimator Using Geometric Extrapolated Kernel
One of the nonparametric procedures used to estimate densities is the kernel method. In this paper, in order to reduce the bias of kernel density estimation, methods such as the usual kernel (UK), the geometric extrapolation usual kernel (GEUK), a bias reduction kernel (BRK) and a geometric extrapolation bias reduction kernel (GEBRK) are introduced. Theoretical properties, including the selection of smoothness para...